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CN In this paper we propose a model with a Dirichlet process mixture of gamma 

^ densities in the bulk part below threshold and a generalized Pareto density in the tail 

J for extreme value estimation. The proposed model is simple and flexible allowing us 

^ posterior density estimation and posterior inference for high quantiles. The model 

• works well even for small sample sizes and in the absence of prior information. 

^ We evaluate the performance of the proposed model through a simulation study. 

^ ^^ Finally, the proposed model is applied to a real environmental data. 

^~~^ keywords Generalized Pareto Distribution, Threshold Estimation, Dirichlet Process 

\^ Mixture. 

in 

P 1 Introduction 

(^ In recent years extreme value mixture models have been proposed as a combination of 

^ a distribution with a "bulk part" below threshold and a generalized Pareto distribution 

t> (GPD) in the tail. Different distributions have been proposed for modelling the "bulk 

S^ part" where the threshold is a parameter to be estimated. The first approach which allow 

H us a transition between the bulk and tail parts is provided by Frigessi, Haug & Harvard 

(2003). Frigessi et al. (2003) uses a WeibuU distribution in the bulk part, a GPD for 

the tail and the location-scale Cauchy cdf in the transition function and the authors use 

maximum likelihood estimation. However in the Frigessi et al. (2003) approach maximum 

likelihood estimation in the bulk part could produce multiple modes and hence some iden- 

tifiability problems. Behrens, Lopez & Gammerman (2004) and Carreu & Bengio (2009) 

consider Gamma and Normal distributions respectively in the bulk part. But an unimodal 
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distribution is not realistic in practice where the density has different unknown shapes in 
many apphcations. do Nacimiento, Gammerman & Freitas (2011) use Bayesian inference 
in the bulk part following the proposal of Wiper, Insua & Ruggeri (2001) who propose 
to assign prior probabilities on the number of components of the mixture of gammas and 
to use the reversible jump algorithm for posterior inference purposes. The authors use 
BIG and DIG criteria for model comparison on a fixed number of gamma components. 
This approach allow us to have a flexible model with multimodality in the bulk distri- 
bution. Also, do Nacimiento et al. (2011) show that using posterior predictive inference 
the discontinuity problem at the threshold is eliminated. MacDonald, Scarrot, Lee, B., 
Reale & Rusell (2011) et al propose a non-parametric approach in the bulk part with 
kernel bandwidth estimators and a GPD in the tail where Bayesian inference is applied. 
For a more exhaustive discussion of extreme value threshold estimation see for exam- 
ple Scarrot & MacDonnald (2012). On the other hand, there is an extensive literature 
on Dirichlet mixture process for density estimation particulary using gaussian distribu- 
tions the main paper is given by Escobar & West (1995). The Dirichlet process is very 
flexible, theoretically coherent and simple and in recent years it has been an important 
tool of many applications for Bayesian density estimation (Ferguson (1973) and Anto- 
niak (1974)). Hanson (2006) proposes the Dirichlet process mixture of gamma densities 
(DPMG) for density estimation of univariate densities on the positive real line. 

In this paper we propose a model with a DPMG below threshold and a GPD in the 
tail. We have important reasons for using the proposed model: First, because DPMG 
could be a powerful tool for density estimation in the bulk part (allow us accommodate 
a very wide variety of shapes and spreads in the bulk part) the tail fit is expected to be 
adequate. Second, the proposed model can be used in the absence of prior information. 
Third, Dirichlet Process Mixture controls the expected number of components (Antoniak 
(1974)) therefore the extensive task for model comparison purposes using BIG and DIG 
criteria on a fixed number of gamma components in the bulk part is not necessary. In 
addition, because DPMG is random we can build credible intervals of the posterior density 
in the bulk part. This paper is organized as follows. Section 2 is devoted to present the 
proposed model. In Section 3 we present a simulation study of the proposed model. In 
Section 4 the proposed model is applied to a real environmental data. Finally in Section 
5 we have the conclusions. 



2 Model 

The density of the Generahzed Pareto Distribution with scale parameter a and shape 
parameter 7 is as follows: 



g[x\ 



^exp(-(a;-M)/CT) if ^ = 0, 



H{x\e) x<u 
H{u\e) + [l-H{u\0)]G{x\(l)) x>u 
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where the vector of parameters = (^, a, u) and x — 'U>Ofor,^>0 and < a; — -u < 
—(p/^ for ^ < 0. We have that GDP is bounded from below by u, bounded from above by 
u — o"/^ if .^ < and unbounded from above if ,^ > 0. The density of the proposed model 
is the following: 

/(x|0,^) = ^^"'^^ "-" (2) 

^ ' ^ \[l-H{u\eMx\<^) x>u ^ ' 

where = (-u, .^, A), ^ = (A, 7) and H{u\9) denotes the cumulative distribution function 
(CDF) of h{x\9) at u. The cumulative distribution function of (2) is as follows: 

F{x\<p,e)-- 

where G(a;|0) is the CDF 

lim^ ^u+F{x\(f),6) = H{u\6) therefore (3) is continuous at u. 

2.1 The Dirichlet Process Mixture of Gamma densities 

The novel proposal is to use in the bulk part of (2) a DPMG, as follows we have a short 
introduction about the DP. A distribution G on follows a dirichlet process DP{a,Go) 
if, given an arbitrary measurable partition, Bi,B2, ...,Bk of O the joint distribution of 
(G'(5i),G'(52),...,G'(5fc)) is Dirichlet {aGo{Bi),aGo{B2), ...,aGo{Bk)) where G{Bi) and 
GolBi) denote the probability of set (-Bj) under G and Gq respectively (see Ferguson 
(1973)). Here Go is a specific distribution on O and a is a precision parameter. Let 
K{; ,0) he a parameter family of distributions functions (CDF's) indexed hj e 0, with 
associated densities k{; 0). If G is proper we define the mixture distribution 

F{-G) = jK{;,e)G{d0) (4) 

where G{d0) can be interpreted as the conditional distribution of 6 given G. We can 
express (4) as f{-;G) = J k{.;G) differentiating with respect to (.). Due to G is random 
then F{.; G) is random. F{.; G) is the model for the stochastic mechanism corresponding 



to Xi,X2, ■.■,Xn assuming Xi given G are i.i.d. from F{.;G) with the DP structure. In 
this paper we implement the Dirichlet Process Mixture model by using the Polya urn 
scheme (see Escobar & West (1995) and MacEachern (1994)). In DPMG we have mixing 
parameters (Aj, 7j) = associated with each Xj. The model can be expressed in hierarchical 
form as follows: 

XilXi,^ r^ h{xi,6i), i = l,..,n (5) 

9i\G ~ G, i = I, ..,n 

G\a,7]^DP{a,Go),Go = Go{.\v) 
a,r] ~p(a)p(?7) 

here 6i = (Aj,7j) and h{xi,6i) denotes the gamma density with the scale parameter, Aj , 
and the shape parameter, 7^, 

h{xi\\i,'ji) = TT7-^Xi'~^ exp {-'jiXi} Xi>0 (6) 

r(7i) 

We use the approach of Hanson (2006) for go{X, '~f\r]) therefore two independent exponential 
distributions are considered as follows 

fi'o(A, llv) = oa exp(-aAA)a^ exp(-a^7) (7) 

with T] = {ax,a^). The parameters of (7) follow gamma priors ax ~ T{bx,cx) and a^ ~ 
r(6^, c^), where T{a, b) denotes the gamma density with parameters a and b. 

2.2 Priors for the parameters in the generaUzed Pareto distri- 
bution 

Now we present the priors for the threshold u, the scale parameter a and shape parameter 
^ of the GPD. The prior distribution for m is a normal density iV(m„,cr^) as suggested 
Behrens et al. (2004). Castellanos & Cabras (2007) obtain the Jeffrey's non-informative 
prior for {a,^) and the authors show this prior produces proper posterior results. The 
prior is the following: 

p(a,e)oc (7-1(1 +0-^(1 + 20-^/^ (8) 

where ^ > —0.5 and a > 0. According to Castellanos & Cabras (1996) situations were 
^ < —0.5 are very unusual in practice. The posterior distribution on the log-scale using 
the density (2) is then: 
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-(1+0/5' 



(9) 



+ ^og{p{u)p{^)p{a)) 
for ,^ 7^ and 

\og{p{9,^\x)) oc J]log(/i(x|^)) + ^log ( (1 - H{ti\9))-exp{-{x - u)/(t) J (10) 

A B ^ ^ ^ 

+ log(p(M)p(Op(^)) 

for .^ = 0. With A = {xi : Xi < u} and B = {xi : Xi > u}. Using the proposed model we 
can compute high quantiles befow threshold. In order to find values beyond the threshold 
we have that 

F(x|0, A,7) = H{u\X,^) + [1 - H{u\X,^)]G{x\<j)) (11) 

where G{x\(f)) is the CDF of the GPD. For example to find the p-quantile, q, we use 



and solve G{q\(f)) = p*. 



p 



p- H{u\X,-f) 
l-H{u\X,j) 



(12) 
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Figure 1: Probability density function of the model (3) for a number of parameters values: 
(a) ^ = -0.4 and a = 3, (b) ^ = 0.4 and a = 3, (c) ^ = -0.4 and a = 4 and (d) ^ = 0.4 
and (7 = 4, threshold u = 11 and the center of the densities is a mixture of two gamma 
densities the tails are modelling with GPD. 



Figure 1. displays the density of the proposed model considering different values in 
the parameters. This model allows a discontinuity of the density at the threshold because 
constrains are basically solve defining the adequate models we consider in this paper for 
the posterior analysis in the tail (see do Nacimiento et al. (2011)). 

3 Simulation study 

In this section we evaluate the performance of the proposed model through a simulation 
study. The precision a of g^ in the DP affects the expected number of components in the 
mixture. Hanson (2006) consider values of a fixed to 0.1 and 1 and also random values 
using different assignments of Gamma priors for a such as r(2, 2) and r(2, 0.5). Here we 
consider the precision for DP using a = 0.1. The hyperparameters of go can be expressed 
in terms of the mean /i = A/7 and variance V = A/7^ of h{x\6) (see Hanson (2006)) as 
two diffuse densities f{fi\ax,a^) = axa^/iaxn + a^)"^ and f(y~^\ax,a^) = r(2, ca/x^ + 0-,,//) 
respectively. Suppose now that ax = a^ = 1, so /(/i|l,l) = 1/(1 + /i)^ which is the 
Beta Prime distribution with hyperparamters of scale and shape equals to 1. The Beta 
prime distribution has been used as default density for modelling inference in Bayesian 
analysis. Therefore we can think that we are modelling the mean of the mixture of 
gammas densities in a non informative (but robust) manner. We consider a small sample 
size n = 200. Hanson (2006) obtain an accurate smooth in an univariate density using 
DPMG with different specifications for a and large sample sizes 1000 and 10000. Here, we 
have that ^ = 0.4, o" = 3 and the threshold -u = 11 at the 90% quantile in the simulated 
data. The simulated mixture density for the central part is: 

h{x) =0.5r(a;|10,4) + 0.5r(a;|6,0.7). (13) 

Following Hanson (2006) the hyperparameters for ax and a^ are bx = b^ = cx = c^ = 0.001 
in order to have a non informative gQ. The prior of the threshold u has mean equal to 
90% quantile in the simulated data and the variance o"^ gives 99% of probability in the 
range between 50% and 99% of the simulated data. As usual in the Metropolis algorithm, 
we adjust the variance of the sampling proposal densities considering the hessian of the 
maximum likelihood estimates using some MCMC simulations. We obtained convergence 
of all parameters using 10000 iterations after a burn-in period of 5000 iterations. Figures 
2 displays the quality of the approach even with a small sample size of n = 200. The 
posterior density in the proposed model reproduces the underline density with precision 
according to the credible interval in the bulk part and posterior predictive mean in the tail. 
The density estimation in bulk part of the proposed model could be even better when large 
sample sizes are considered (see Hanson (2006)). Figure 3 displays the posterior densities 
of threshold u, scale a, and shape ^. We can see the posterior distribution represents 
nicely the true parameters. In particular the threshold is centered around the true value 
11. Figures 4 shows as the posterior distributions of the predictive quantiles at 95% is 
accurately estimated. 
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Figure 2: Dashed red line: Posterior predictive density using the Dirichlet process mixture 
of gamma densities in the bulk part and a GPD in the tail. Full black line is the true 
density and the dashed red line is the simulated density. The histogram displays the 
simulated data. The vertical full black line is the true threshold location and the vertical 
dashed red line is the posterior threshold location. 
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Figure 3: Posterior distribution of u, C, and a. 
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Figure 4: Posterior histogram of the 95% quantile for the simulation. Red hne the true 
quantile. 



4 Application to the flow levels in the Gurabo river 

River flow levels are important measures to prevent disasters in populations when flow 
rate exceeds the capacity of the river channel. We applied the proposed model in river 
flow levels measured at cubic feet per second (ft^/s) in Gurabo river at Gurabo Puerto 
Rico. The data is available at waterdata.usgs.gov. The flows are monitoring between 
December 2 2012, 12:00 am to December 4 2012, 8:45 pm. The measures are made each 
15 minutes for a total sample size of n=254. We obtained convergence of all parameters 
using 5000 iterations after a burn-in period of 2000 iterations. 

Figure 5 displays the posterior distributions of the parameters in the tail of the proposed 
model. The threshold, scale and shape are around of 1430 (quantile at 96% according to 
the simulation), 300 and -0.25. Figure 6 shows the posterior distribution for the 99.9% 
high quantile, we can see the maximum value is less than the posterior mean for the 
quantile at 99.9% and the posterior distribution is asymmetric which is expected. Figure 
7 displays the posterior density using DPMG in the bulk part and a GPD in the tail. We 
can see our proposed model reproduces the data in the bulk and tail parts. As a conclusion 
according to the posterior analysis (based on the last two days) with a probability of 0.1% 
we can see values bigger than approximately 1998 ft^/s in the Gurabo River. 
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Figure 5: Posterior histogram of the GPD parameters in the tail of the proposed model 
for the application. 



2000 



4000 



6000 



8000 



Figure 6: Posterior distribution of the 99.9% quantile for the apphcation. Black hue the 
maximum observed data and red hne the posterior mean for the 99.9% simulated quantile. 
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Figure 7: Posterior density using the Dirichlet process mixture of gamma densities in the 
bulk part and posterior predictive distribution using a GPD in the tail. The vertical red 
line is the posterior threshold location. 
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5 Conclusion 

We proposed a model with a Dirichlet process mixture of gamma densities in the bulk part 
of the distribution and a heavy tailed generalized Pareto distribution in the tail for extreme 
value estimation. The proposal is very flexible and simple for density estimation in the 
bulk part and posterior inference in the tail. According to the simulations and application 
to real data the model works well even for small sample sizes and in the absence of prior 
information. The Dirichlet Process mixture controls the expected number of components 
and so the extensive task for model comparison purposes using BIG and DIG criteria on 
a fixed number of gamma components in the bulk part is not necessary. The proposed 
model was applied to a real environmental data set but interesting applications can be 
found in different areas such as clinical trials or finance. 
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A MCMC algorithm 

1. For the bulk part we need to compute h{x\6) and also H{u\6), we consider the polya 
urn expression in the DPMG to compute posterior realizations for the density h{x\9). 
Let {91, ..., 6'**} the unique values of 9i, Ui = j if and only ii 9i = 9* i = 1, 2, , ...,n 
and Uj = \{i : Ui = j}\ and j = 1,2, ...,n* with n* number of distinct values. We 
use the following transition probabihties: 

(a) Polya urn: marginalized G (using ~ to indicate summaries without Wj) and 
defining a specific configuration {ui, ..,Co'„} with transition probabilities: 

p{u, = i\u^,) ^ h ^; = i'-'^*^' (14) 

la J = n +1 

(b) Resampling cluster membership indicators uf. 

v(u;- = i\ x^o.h^^^'"''^*^^ j = l,...,n*-, 

^' '^'■■■' " \afk{xf,ei)dGo{ei\7]) j = n*- + l ^ ' 

where we use the close results in Hanson (2006): 

kix^■,9*) = h{x^\9*) (16) 



k{xi;9i)dGo{9,\fi,T^)= (17) 



Xi{xi + ax){ax - log{xi/{xi + a^)))' 
12 



1^ 



with probability proportional to njk{xi;6*) we make 6i = 6*^. On the other 
hand with probability proportional to a J k{yi;9i,(f))dGo{9i\ri) we open a new 
component and we sample 6i = (Aj,7j). First we sample Xi\ri ~ r{2,ax — 
\o^{xil{xi + a^))^) then we sample ^iW^r] ~ r(Aj + l,Xj + a^). 

2. Now we are interested in to show the sampling for the parameters in the GPD 
defined in the tails of (2). Following do Nacimiento et al. (2011) we compute the 
posterior distribution of -u, ^ and a using three steeps of the Metropolis Hasting 
algorithm. The algorithm is as follow: 

(a) Sampling ^: proposal transition kernel is given by a truncated normal 

Ct ~ iV(r , \/^)/(-aV(M - t.^), oo) (19) 

where V^ is a variance in order to improve the mixing. M is the maximum 
value in the sample the acceptance probability is 

. p(^*,0*|a;)$((e^ + a7(M-iz^))/v/T^) 

a^ = mm < 1, 



p{e\ 0^|x)<l>((e + (r*/{M - U*))/y/V^) 

where is the density function of the standard normal distribution. 

(b) Sampling a: If ^(^+^) > then a* is sampled from the Gamma distribution 
T(a'^^^'' /Vf^, a^ /Va-) where V^ is a variance in order to improve the mixing. On 
the other hand if ^(^+^) < then a* is sampled from a truncated normal 

a*\a^ ~ N{a', V,)I{-i^^+^\M - u^), oo) (20) 

the acceptance probabilities are respectively: 

p{e\ 0*|a;)$((a^ + ^(^+i)(M - M^)/v^) 



an = min < 1 



and 



' p{e\ (i)^\x)<^{{(T* + ^(''+i)(M - u'')/^) 



. .^ p(r,0*|x)r(aV'W/v;,^7K) 

a„ = mm < 1. 



"' p{e\ 0^|a;)r(a*|a2W/V;, (T^^yVn 



(c) The threshold u* is sampled following the requirement of the lower truncation 
for the GPD. Therefore u* is sampled using a truncated normal density 



^* I ^b 
cr cr ~ 



iV(«^ K)/(a('+'^ oo) (21) 
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jf ^(b+1) > Q then a*^^+^) is the minimum value at the sample in the iteration 6+1 
otherwise if ^^^+^^ < a(^+^) = M + a'^b+i) ^ ^(b+i) _ rjnj^g acceptance probability 
is then 



a^ 



mm 



p(r , 0*|x)$((m^ - a^+i)/v^) 
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